{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pinecone-io/examples/blob/master/learn/generation/langchain/v1/xml-agents.ipynb) [![Open nbviewer](https://raw.githubusercontent.com/pinecone-io/examples/master/assets/nbviewer-shield.svg)](https://nbviewer.org/github/pinecone-io/examples/blob/master/learn/generation/langchain/v1/xml-agents.ipynb)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "1eOnr6z_zLoc",
        "outputId": "67999690-7f39-48c2-99d7-bf841f26e3d9"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Python 3.10.12\n"
          ]
        }
      ],
      "source": [
        "!python --version"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "qQG4iSxVxw8f"
      },
      "source": [
        "# XML Agents with RAG and LangChain v1"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ov6TCS7bx1oI"
      },
      "source": [
        "LangChain v1 brought a lot of changes and when comparing the LangChain of versions `0.0.3xx` to `0.1.x` there's plenty of changes to the preferred way of doing things. That is very much the case for agents.\n",
        "\n",
        "The way that we initialize and use agents is generally clearer than it was in the past \u2014 there are still many abstractions, but we can (and are encouraged to) get closer to the agent logic itself. This can make for some confusion at first, but once understood the new logic can be much clearer than with previous versions.\n",
        "\n",
        "In this example, we'll be building a RAG agent with LangChain v1. We will use Claude 2.1 for our LLM, Cohere's embed v3 model for knowledge embeddings, and Pinecone to power our knowledge retrieval.\n",
        "\n",
        "To begin, let's install the prerequisites:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "id": "zshhLDrgbFKk"
      },
      "outputs": [],
      "source": [
        "!pip install -qU \\\n",
        "    langchain==0.1.1 \\\n",
        "    langchain-community==0.0.13 \\\n",
        "    langchainhub==0.1.14 \\\n",
        "    anthropic==0.14.0 \\\n",
        "    cohere==4.45 \\\n",
        "    pinecone-client==3.1.0 \\\n",
        "    datasets==2.16.1"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "bpKfZkUYzQhB"
      },
      "source": [
        "## Finding Knowledge"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JDTQoxcNzUa8"
      },
      "source": [
        "The first thing we need for an agent using RAG is somewhere we want to pull knowledge from. We will use v2 of the AI ArXiv dataset, available on Hugging Face Datasets at [`jamescalam/ai-arxiv2-chunks`](https://huggingface.co/datasets/jamescalam/ai-arxiv2-chunks).\n",
        "\n",
        "_Note: we're using the prechunked dataset. For the raw version see [`jamescalam/ai-arxiv2`](https://huggingface.co/datasets/jamescalam/ai-arxiv2)._"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "U9gpYFnzbFKm",
        "outputId": "6c85bc8e-58ce-49d0-9027-0151ebdea0e1"
      },
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:88: UserWarning: \n",
            "The secret `HF_TOKEN` does not exist in your Colab secrets.\n",
            "To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.\n",
            "You will be able to reuse this secret in all of your notebooks.\n",
            "Please note that authentication is recommended but still optional to access public models or datasets.\n",
            "  warnings.warn(\n"
          ]
        },
        {
          "data": {
            "text/plain": [
              "Dataset({\n",
              "    features: ['doi', 'chunk-id', 'chunk', 'id', 'title', 'summary', 'source', 'authors', 'categories', 'comment', 'journal_ref', 'primary_category', 'published', 'updated', 'references'],\n",
              "    num_rows: 20000\n",
              "})"
            ]
          },
          "execution_count": 3,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "from datasets import load_dataset\n",
        "\n",
        "dataset = load_dataset(\"jamescalam/ai-arxiv2-chunks\", split=\"train[:20000]\")\n",
        "dataset"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "_bP7ZW-ybFKm",
        "outputId": "77253d1f-05f0-4601-ddd0-7d15fbcee793"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'doi': '2401.09350',\n",
              " 'chunk-id': 1,\n",
              " 'chunk': 'These neural networks and their training algorithms may be complex, and the scope of their impact broad and wide, but nonetheless they are simply functions in a high-dimensional space. A trained neural network takes a vector as input, crunches and transforms it in various ways, and produces another vector, often in some other space. An image may thereby be turned into a vector, a song into a sequence of vectors, and a social network as a structured collection of vectors. It seems as though much of human knowledge, or at least what is expressed as text, audio, image, and video, has a vector representation in one form or another.\\nIt should be noted that representing data as vectors is not unique to neural networks and deep learning. In fact, long before learnt vector representations of pieces of data\u00e2\\x80\\x94what is commonly known as \u00e2\\x80\\x9cembeddings\u00e2\\x80\\x9d\u00e2\\x80\\x94came along, data was often encoded as hand-crafted feature vectors. Each feature quanti- fied into continuous or discrete values some facet of the data that was deemed relevant to a particular task (such as classification or regression). Vectors of that form, too, reflect our understanding of a real-world object or concept.',\n",
              " 'id': '2401.09350#1',\n",
              " 'title': 'Foundations of Vector Retrieval',\n",
              " 'summary': 'Vectors are universal mathematical objects that can represent text, images,\\nspeech, or a mix of these data modalities. That happens regardless of whether\\ndata is represented by hand-crafted features or learnt embeddings. Collect a\\nlarge enough quantity of such vectors and the question of retrieval becomes\\nurgently relevant: Finding vectors that are more similar to a query vector.\\nThis monograph is concerned with the question above and covers fundamental\\nconcepts along with advanced data structures and algorithms for vector\\nretrieval. In doing so, it recaps this fascinating topic and lowers barriers of\\nentry into this rich area of research.',\n",
              " 'source': 'http://arxiv.org/pdf/2401.09350',\n",
              " 'authors': 'Sebastian Bruch',\n",
              " 'categories': 'cs.DS, cs.IR',\n",
              " 'comment': None,\n",
              " 'journal_ref': None,\n",
              " 'primary_category': 'cs.DS',\n",
              " 'published': '20240117',\n",
              " 'updated': '20240117',\n",
              " 'references': []}"
            ]
          },
          "execution_count": 4,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset[1]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "VX6NdQhgbFKn"
      },
      "source": [
        "## Building the Knowledge Base"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MDCbqQl_bFKn"
      },
      "source": [
        "To build our knowledge base we need _two things_:\n",
        "\n",
        "1. Embeddings, for this we will use `CohereEmbeddings` using Cohere's embedding models, which do need an [API key](https://dashboard.cohere.com/api-keys).\n",
        "2. A vector database, where we store our embeddings and query them. We use Pinecone which again requires a [free API key](https://app.pinecone.io).\n",
        "\n",
        "First we initialize our connection to Cohere and define an `embed` helper function:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "id": "PzBQ_iE6bFKn"
      },
      "outputs": [],
      "source": [
        "import os\n",
        "from getpass import getpass\n",
        "\n",
        "cohere_key = os.getenv(\"COHERE_API_KEY\") or getpass(\"Cohere API key: \")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "id": "wkw0KyLRbFKo"
      },
      "outputs": [],
      "source": [
        "from langchain_community.embeddings import CohereEmbeddings\n",
        "\n",
        "embed = CohereEmbeddings(model=\"embed-english-v3.0\", cohere_api_key=cohere_key)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "LhDzfsczbFKo"
      },
      "source": [
        "Then we initialize our connection to Pinecone:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {
        "id": "j0N7EcJibFKo"
      },
      "outputs": [],
      "source": [
        "from pinecone import Pinecone\n",
        "\n",
        "# initialize connection to pinecone (get API key at app.pinecone.io)\n",
        "api_key = os.getenv(\"PINECONE_API_KEY\") or getpass(\"Pinecone API key: \")\n",
        "\n",
        "# configure client\n",
        "pc = Pinecone(api_key=api_key)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "g65RLGIpbFKo"
      },
      "source": [
        "Now we setup our index specification, this allows us to define the cloud provider and region where we want to deploy our index. You can find a list of all [available providers and regions here](https://docs.pinecone.io/docs/projects)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {
        "id": "8stIZYKdbFKo"
      },
      "outputs": [],
      "source": [
        "from pinecone import ServerlessSpec\n",
        "\n",
        "spec = ServerlessSpec(\n",
        "    cloud=\"aws\", region=\"us-west-2\"\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "-8Ep3743bFKo"
      },
      "source": [
        "Before creating an index, we need the dimensionality of our Cohere embedding model, which we can find easily by creating an embedding and checking the length:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "DwMhLWLDbFKo",
        "outputId": "63ce2962-d8f5-48c0-fa26-f065e33854e5"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "1024"
            ]
          },
          "execution_count": 10,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "vec = embed.embed_documents([\"ello\"])\n",
        "len(vec[0])\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "G3X7nZIabFKp"
      },
      "source": [
        "Now we create the index using our embedding dimensionality, and a metric also compatible with the model (this can be either cosine or dotproduct). We also pass our spec to index initialization."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "E6Bl7xTJbFKp",
        "outputId": "6bdfb7b6-ed1d-4dfb-fe6c-eace78a41fd0"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'dimension': 1024,\n",
              " 'index_fullness': 0.0,\n",
              " 'namespaces': {},\n",
              " 'total_vector_count': 0}"
            ]
          },
          "execution_count": 11,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "import time\n",
        "\n",
        "index_name = \"xml-agents\"\n",
        "\n",
        "# check if index already exists (it shouldn't if this is first time)\n",
        "if index_name not in pc.list_indexes().names():\n",
        "    # if does not exist, create index\n",
        "    pc.create_index(\n",
        "        index_name,\n",
        "        dimension=len(vec[0]),  # dimensionality of cohere v3\n",
        "        metric='dotproduct',\n",
        "        spec=spec\n",
        "    )\n",
        "    # wait for index to be initialized\n",
        "    while not pc.describe_index(index_name).status['ready']:\n",
        "        time.sleep(1)\n",
        "\n",
        "# connect to index\n",
        "index = pc.Index(index_name)\n",
        "time.sleep(1)\n",
        "# view index stats\n",
        "index.describe_index_stats()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "6ZUn2lu7bFKp"
      },
      "source": [
        "### Populating our Index"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "PeVD6d0sbFKp"
      },
      "source": [
        "Now our knowledge base is ready to be populated with our data. We will use the `embed` helper function to embed our documents and then add them to our index.\n",
        "\n",
        "We will also include metadata from each record."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 49,
          "referenced_widgets": [
            "0c54f36fde0f48b1971bed4fd8f17c25",
            "cff5612716b8491f9e7e319d037a4532",
            "2ba98509dedb4a0ca6a8b50275e892f3",
            "36ee54cc891f45e699eb23c63c3bae28",
            "b44c08bcc5604e259c4b7c54a8ab53e7",
            "5d225cd9d5384c8c95744b8facdb174d",
            "aaab14fba4a548848fa731dd2235c8f5",
            "704efb62e6054a608fe93f2a3fc9b587",
            "55c292a3d8cb40239e122a764b581e29",
            "2b6a9a456f264bd8be154d980116d85a",
            "bb55cfbf693d4b7a988f5395826e5e3e"
          ]
        },
        "id": "hb00VSTqbFKp",
        "outputId": "902da9d1-4d27-48a0-93c9-8cd917c5b92d"
      },
      "outputs": [
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "0c54f36fde0f48b1971bed4fd8f17c25",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "  0%|          | 0/200 [00:00<?, ?it/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        }
      ],
      "source": [
        "from tqdm.auto import tqdm\n",
        "\n",
        "# easier to work with dataset as pandas dataframe\n",
        "data = dataset.to_pandas()\n",
        "\n",
        "batch_size = 100\n",
        "\n",
        "for i in tqdm(range(0, len(data), batch_size)):\n",
        "    i_end = min(len(data), i+batch_size)\n",
        "    # get batch of data\n",
        "    batch = data.iloc[i:i_end]\n",
        "    # generate unique ids for each chunk\n",
        "    ids = [f\"{x['doi']}-{x['chunk-id']}\" for i, x in batch.iterrows()]\n",
        "    # get text to embed\n",
        "    texts = [x['chunk'] for _, x in batch.iterrows()]\n",
        "    # embed text\n",
        "    embeds = embed.embed_documents(texts)\n",
        "    # get metadata to store in Pinecone\n",
        "    metadata = [\n",
        "        {'text': x['chunk'],\n",
        "         'source': x['source'],\n",
        "         'title': x['title']} for i, x in batch.iterrows()\n",
        "    ]\n",
        "    # add to Pinecone\n",
        "    index.upsert(vectors=zip(ids, embeds, metadata))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "z6VVT3X_EMDO"
      },
      "source": [
        "Create a tool for our agent to use when searching for ArXiv papers:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 13,
      "metadata": {
        "id": "X9J5jHKcEQz6"
      },
      "outputs": [],
      "source": [
        "from langchain.agents import tool\n",
        "\n",
        "@tool\n",
        "def arxiv_search(query: str) -> str:\n",
        "    \"\"\"Use this tool when answering questions about AI, machine learning, data\n",
        "    science, or other technical questions that may be answered using arXiv\n",
        "    papers.\n",
        "    \"\"\"\n",
        "    # create query vector\n",
        "    xq = embed.embed_query(query)\n",
        "    # perform search\n",
        "    out = index.query(vector=xq, top_k=5, include_metadata=True)\n",
        "    # reformat results into string\n",
        "    results_str = \"\\n\\n\".join(\n",
        "        [x[\"metadata\"][\"text\"] for x in out[\"matches\"]]\n",
        "    )\n",
        "    return results_str\n",
        "\n",
        "tools = [arxiv_search]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "uN7d_4r-JMPW"
      },
      "source": [
        "When this tool is used by our agent it will execute it like so:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 14,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "eq4H-2RpI1U3",
        "outputId": "eea6c5e6-0a49-4be2-8a58-2f069191f07a"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Ethical Considerations and Limitations (Section 5.2) Llama 2 is a new technology that carries risks with use. Testing conducted to date has been in English, and has not covered, nor could it cover all scenarios. For these reasons, as with all LLMs, Llama 2\u00e2\u0080\u0099s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate or objectionable responses to user prompts. Therefore, before deploying any applications of Llama 2, developers should perform safety testing and tuning tailored to their speci\u00ef\u00ac\u0081c applications of the model. Please see the Responsible Use Guide available available at https://ai.meta.com/llama/responsible-user-guide\n",
            "Table 52: Model card for Llama 2.\n",
            "77\n",
            "\n",
            "Model Developers Meta AI Variations Llama 2 comes in a range of parameter sizes\u00e2\u0080\u00947B, 13B, and 70B\u00e2\u0080\u0094as well as pretrained and \u00ef\u00ac\u0081ne-tuned variations. Input Models input text only. Output Models generate text only. Model Architecture Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised \u00ef\u00ac\u0081ne-tuning (SFT) and reinforce- ment learning with human feedback (RLHF) to align to human preferences for helpfulness and safety. Model Dates Llama 2 was trained between January 2023 and July 2023. Status This is a static model trained on an o\u00ef\u00ac\u0084ine dataset. Future versions of the tuned models will be released as we improve model safety with community feedback. License Where to send com- ments A custom commercial models-and-libraries/llama-downloads/ Instructions on how to provide feedback or comments on the model can be found in the model README, or by opening an issue in the GitHub repository (https://github.com/facebookresearch/llama/). license is available at: ai.meta.com/resources/ Intended Use Intended Use Cases Llama 2 is intended for commercial and research use in\n",
            "\n",
            "We believe that the open release of LLMs, when done safely, will be a net bene\u00ef\u00ac\u0081t to society. Like all LLMs, Llama 2 is a new technology that carries potential risks with use (Bender et al., 2021b; Weidinger et al., 2021; Solaiman et al., 2023). Testing conducted to date has been in English and has not \u00e2\u0080\u0094 and could not \u00e2\u0080\u0094 cover all scenarios. Therefore, before deploying any applications of Llama 2-Chat, developers should perform safety testing and tuning tailored to their speci\u00ef\u00ac\u0081c applications of the model. We provide a responsible use guide\u00c2\u00b6 and code examples\u00e2\u0080\u0096 to facilitate the safe deployment of Llama 2 and Llama 2-Chat. More details of our responsible release strategy can be found in Section 5.3.\n",
            "The remainder of this paper describes our pretraining methodology (Section 2), \u00ef\u00ac\u0081ne-tuning methodology (Section 3), approach to model safety (Section 4), key observations and insights (Section 5), relevant related work (Section 6), and conclusions (Section 7).\n",
            "\u00e2\u0080\u00a1https://ai.meta.com/resources/models-and-libraries/llama/ \u00c2\u00a7We are delaying the release of the 34B model due to a lack of time to su\u00ef\u00ac\u0083ciently red team. \u00c2\u00b6https://ai.meta.com/llama \u00e2\u0080\u0096https://github.com/facebookresearch/llama\n",
            "4\n",
            "\n",
            "We are releasing the following models to the general public for research and commercial use\u00e2\u0080\u00a1:\n",
            "1. Llama 2, an updated version of Llama 1, trained on a new mix of publicly available data. We also increased the size of the pretraining corpus by 40%, doubled the context length of the model, and adopted grouped-query attention (Ainslie et al., 2023). We are releasing variants of Llama 2 with 7B, 13B, and 70B parameters. We have also trained 34B variants, which we report on in this paper but are not releasing.\u00c2\u00a7\n",
            "2. Llama 2-Chat, a \u00ef\u00ac\u0081ne-tuned version of Llama 2 that is optimized for dialogue use cases. We release variants of this model with 7B, 13B, and 70B parameters as well.\n",
            "\n",
            "In this work, we develop and release Llama 2, a family of pretrained and \u00ef\u00ac\u0081ne-tuned LLMs, Llama 2 and Llama 2-Chat, at scales up to 70B parameters. On the series of helpfulness and safety benchmarks we tested, Llama 2-Chat models generally perform better than existing open-source models. They also appear to be on par with some of the closed-source models, at least on the human evaluations we performed (see Figures 1 and 3). We have taken measures to increase the safety of these models, using safety-speci\u00ef\u00ac\u0081c data annotation and tuning, as well as conducting red-teaming and employing iterative evaluations. Additionally, this paper contributes a thorough description of our \u00ef\u00ac\u0081ne-tuning methodology and approach to improving LLM safety. We hope that this openness will enable the community to reproduce \u00ef\u00ac\u0081ne-tuned LLMs and continue to improve the safety of those models, paving the way for more responsible development of LLMs. We also share novel observations we made during the development of Llama 2 and Llama 2-Chat, such as the emergence of tool usage and temporal organization of knowledge.\n",
            "3\n"
          ]
        }
      ],
      "source": [
        "print(\n",
        "    arxiv_search.run(tool_input={\"query\": \"can you tell me about llama 2?\"})\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XUvJOqrNhYIh"
      },
      "source": [
        "## Defining XML Agent"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "s45dwd78hbvk"
      },
      "source": [
        "The XML agent is built primarily to support Anthropic models. Anthropic models have been trained to use XML tags like `<input>{some input}</input` or when using a tool they use:\n",
        "\n",
        "```\n",
        "<tool>{tool name}</tool>\n",
        "<tool_input>{tool input}</tool_input>\n",
        "```\n",
        "\n",
        "This is much different to the format produced by typical ReAct agents, which is not as well supported by Anthropic models.\n",
        "\n",
        "To create an XML agent we need a `prompt`, `llm`, and list of `tools`. We can download a prebuilt prompt for conversational XML agents from LangChain hub."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 15,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "ntuT7UuXeMz0",
        "outputId": "afa6e6dd-d12c-43e1-8b07-0a161c15d66c"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "ChatPromptTemplate(input_variables=['agent_scratchpad', 'input', 'tools'], partial_variables={'chat_history': ''}, messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['agent_scratchpad', 'chat_history', 'input', 'tools'], template=\"You are a helpful assistant. Help the user answer any questions.\\n\\nYou have access to the following tools:\\n\\n{tools}\\n\\nIn order to use a tool, you can use <tool></tool> and <tool_input></tool_input> tags. You will then get back a response in the form <observation></observation>\\nFor example, if you have a tool called 'search' that could run a google search, in order to search for the weather in SF you would respond:\\n\\n<tool>search</tool><tool_input>weather in SF</tool_input>\\n<observation>64 degrees</observation>\\n\\nWhen you are done, respond with a final answer between <final_answer></final_answer>. For example:\\n\\n<final_answer>The weather in SF is 64 degrees</final_answer>\\n\\nBegin!\\n\\nPrevious Conversation:\\n{chat_history}\\n\\nQuestion: {input}\\n{agent_scratchpad}\"))])"
            ]
          },
          "execution_count": 15,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "from langchain import hub\n",
        "\n",
        "prompt = hub.pull(\"hwchase17/xml-agent-convo\")\n",
        "prompt"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "rfdcKCdwi0SL"
      },
      "source": [
        "We can see the XML format being used throughout the prompt when explaining to the LLM how it should use tools."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 16,
      "metadata": {
        "id": "kDHuU2uOdW91"
      },
      "outputs": [],
      "source": [
        "from langchain_community.chat_models import ChatAnthropic\n",
        "\n",
        "anthropic_api_key = os.getenv(\"ANTHROPIC_API_KEY\") or getpass(\"Anthropic API key: \")\n",
        "\n",
        "# chat completion llm\n",
        "llm = ChatAnthropic(\n",
        "    anthropic_api_key=anthropic_api_key,\n",
        "    model_name='claude-2.1',\n",
        "    temperature=0.0\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "g33Nt-xijPKG"
      },
      "source": [
        "When the agent is run we will provide it with a single `input` \u2014 this is the input text from a user. However, within the agent logic an *agent_scratchpad* object will be passed too, which will include tool information. To feed this information into our LLM we will need to transform it into the XML format described above, we define the `convert_intermediate_steps` function to handle that."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 17,
      "metadata": {
        "id": "TMMBgMBlIJoq"
      },
      "outputs": [],
      "source": [
        "def convert_intermediate_steps(intermediate_steps):\n",
        "    log = \"\"\n",
        "    for action, observation in intermediate_steps:\n",
        "        log += (\n",
        "            f\"<tool>{action.tool}</tool><tool_input>{action.tool_input}\"\n",
        "            f\"</tool_input><observation>{observation}</observation>\"\n",
        "        )\n",
        "    return log"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "T5_PQWVckAOi"
      },
      "source": [
        "We must also parse the tools into a string containing `tool_name: tool_description` \u2014 we handle that with the `convert_tools` function."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 18,
      "metadata": {
        "id": "qxbrF5a4j9il"
      },
      "outputs": [],
      "source": [
        "def convert_tools(tools):\n",
        "    return \"\\n\".join([f\"{tool.name}: {tool.description}\" for tool in tools])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "SCVI2dyUIRg6"
      },
      "source": [
        "With everything ready we can go ahead and initialize our agent object using [**L**ang**C**hain **E**xpression **L**anguage (LCEL)](https://www.pinecone.io/learn/series/langchain/langchain-expression-language/). We add instructions for when the LLM should _stop_ generating with `llm.bind(stop=[...])` and finally we parse the output from the agent using an `XMLAgentOutputParser` object."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 19,
      "metadata": {
        "id": "Z3yhTDmEIU4n"
      },
      "outputs": [],
      "source": [
        "from langchain.agents.output_parsers import XMLAgentOutputParser\n",
        "\n",
        "agent = (\n",
        "    {\n",
        "        \"input\": lambda x: x[\"input\"],\n",
        "        # without \"chat_history\", tool usage has no context of prev interactions\n",
        "        \"chat_history\": lambda x: x[\"chat_history\"],\n",
        "        \"agent_scratchpad\": lambda x: convert_intermediate_steps(\n",
        "            x[\"intermediate_steps\"]\n",
        "        ),\n",
        "    }\n",
        "    | prompt.partial(tools=convert_tools(tools))\n",
        "    | llm.bind(stop=[\"</tool_input>\", \"</final_answer>\"])\n",
        "    | XMLAgentOutputParser()\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MG2_hL4hkudq"
      },
      "source": [
        "With our `agent` object initialized we pass it to an `AgentExecutor` object alongside our original `tools` list:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 20,
      "metadata": {
        "id": "YHW_K3WOIsXw"
      },
      "outputs": [],
      "source": [
        "from langchain.agents import AgentExecutor\n",
        "\n",
        "agent_executor = AgentExecutor(\n",
        "    agent=agent, tools=tools, verbose=True\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QRCtHauRlkLc"
      },
      "source": [
        "Now we can use the agent via the `invoke` method:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 21,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "Y_Aqp20qloj7",
        "outputId": "2bc17c74-775d-4dd7-f491-4e413cb9e929"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "\n",
            "\n",
            "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
            "\u001b[32;1m\u001b[1;3m <tool>arxiv_search</tool><tool_input>llama 2\u001b[0m\u001b[36;1m\u001b[1;3mEthical Considerations and Limitations (Section 5.2) Llama 2 is a new technology that carries risks with use. Testing conducted to date has been in English, and has not covered, nor could it cover all scenarios. For these reasons, as with all LLMs, Llama 2\u00e2\u0080\u0099s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate or objectionable responses to user prompts. Therefore, before deploying any applications of Llama 2, developers should perform safety testing and tuning tailored to their speci\u00ef\u00ac\u0081c applications of the model. Please see the Responsible Use Guide available available at https://ai.meta.com/llama/responsible-user-guide\n",
            "Table 52: Model card for Llama 2.\n",
            "77\n",
            "\n",
            "Model Developers Meta AI Variations Llama 2 comes in a range of parameter sizes\u00e2\u0080\u00947B, 13B, and 70B\u00e2\u0080\u0094as well as pretrained and \u00ef\u00ac\u0081ne-tuned variations. Input Models input text only. Output Models generate text only. Model Architecture Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised \u00ef\u00ac\u0081ne-tuning (SFT) and reinforce- ment learning with human feedback (RLHF) to align to human preferences for helpfulness and safety. Model Dates Llama 2 was trained between January 2023 and July 2023. Status This is a static model trained on an o\u00ef\u00ac\u0084ine dataset. Future versions of the tuned models will be released as we improve model safety with community feedback. License Where to send com- ments A custom commercial models-and-libraries/llama-downloads/ Instructions on how to provide feedback or comments on the model can be found in the model README, or by opening an issue in the GitHub repository (https://github.com/facebookresearch/llama/). license is available at: ai.meta.com/resources/ Intended Use Intended Use Cases Llama 2 is intended for commercial and research use in\n",
            "\n",
            "# GenAI, Meta\n",
            "# Abstract\n",
            "In this work, we develop and release Llama 2, a collection of pretrained and \u00ef\u00ac\u0081ne-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our \u00ef\u00ac\u0081ne-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed- source models. We provide a detailed description of our approach to \u00ef\u00ac\u0081ne-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.\n",
            "\u00e2\u0088\u0097Equal contribution, corresponding authors: {tscialom, htouvron}@meta.com \u00e2\u0080\u00a0Second author\n",
            "Contributions for all the authors can be found in Section A.1.\n",
            "# Contents\n",
            "# 1 Introduction\n",
            "\n",
            "LLaMA-2 LLaMA-2 (Touvron et al., 2023b) consists of a series of base language models with a parameter count ranging from 7 billion to 70 billion. These base models are solely trained to opti- mize the likelihood of next-word prediction in the language modeling task. For a fair comparison, we employ the same prompt for LLaMA-2 as used for Dromedary-2.\n",
            "LLaMA-2-Chat LLaMA-2-Chat (Touvron et al., 2023b) is an adaptation tailored for dialogue applications. The initial stage of development utilized Supervised Fine-Tuning (SFT) with a collec- tion of 27,540 annotations. For reward modeling, the new human preference annotations for safety and helpfulness reached a count of 1,418,091. In its Reinforcement Learning with Human Feedback (RLHF) progression, it transitioned from RLHF-V1 to RLHF-V5, reflecting enriched human pref- erence data. The model predominantly employed Rejection Sampling fine-tuning up to RLHF-V4. Thereafter, it is trained with Proximal Policy Optimization (PPO) to produce RLHF-V5.\n",
            "\n",
            "In this work, we develop and release Llama 2, a family of pretrained and \u00ef\u00ac\u0081ne-tuned LLMs, Llama 2 and Llama 2-Chat, at scales up to 70B parameters. On the series of helpfulness and safety benchmarks we tested, Llama 2-Chat models generally perform better than existing open-source models. They also appear to be on par with some of the closed-source models, at least on the human evaluations we performed (see Figures 1 and 3). We have taken measures to increase the safety of these models, using safety-speci\u00ef\u00ac\u0081c data annotation and tuning, as well as conducting red-teaming and employing iterative evaluations. Additionally, this paper contributes a thorough description of our \u00ef\u00ac\u0081ne-tuning methodology and approach to improving LLM safety. We hope that this openness will enable the community to reproduce \u00ef\u00ac\u0081ne-tuned LLMs and continue to improve the safety of those models, paving the way for more responsible development of LLMs. We also share novel observations we made during the development of Llama 2 and Llama 2-Chat, such as the emergence of tool usage and temporal organization of knowledge.\n",
            "3\u001b[0m\u001b[32;1m\u001b[1;3m <final_answer>\n",
            "Based on the information provided, Llama 2 is a series of large language models developed by Meta AI ranging from 7 billion to 70 billion parameters. Llama 2-Chat is a fine-tuned version optimized for dialog applications using supervised learning and reinforcement learning with human feedback. It aims to be helpful, safe, and on par with closed-source chat models. Details are provided on the model architecture, training methodology, and approach to improving safety. The goal is to enable open and responsible development of large language models.\n",
            "\u001b[0m\n",
            "\n",
            "\u001b[1m> Finished chain.\u001b[0m\n"
          ]
        },
        {
          "data": {
            "text/plain": [
              "{'input': 'can you tell me about llama 2?',\n",
              " 'chat_history': '',\n",
              " 'output': '\\nBased on the information provided, Llama 2 is a series of large language models developed by Meta AI ranging from 7 billion to 70 billion parameters. Llama 2-Chat is a fine-tuned version optimized for dialog applications using supervised learning and reinforcement learning with human feedback. It aims to be helpful, safe, and on par with closed-source chat models. Details are provided on the model architecture, training methodology, and approach to improving safety. The goal is to enable open and responsible development of large language models.\\n'}"
            ]
          },
          "execution_count": 21,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "agent_executor.invoke({\n",
        "    \"input\": \"can you tell me about llama 2?\",\n",
        "    \"chat_history\": \"\"\n",
        "})"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Eae-JyUFl--N"
      },
      "source": [
        "That looks pretty good, but right now our agent is _stateless_ \u2014 making it hard to have a conversation with. We can give it memory in many different ways, but one the easiest ways to do so is to use `ConversationBufferWindowMemory`."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 22,
      "metadata": {
        "id": "EqOMNQUfmOEr"
      },
      "outputs": [],
      "source": [
        "from langchain.chains.conversation.memory import ConversationBufferWindowMemory\n",
        "\n",
        "# conversational memory\n",
        "conversational_memory = ConversationBufferWindowMemory(\n",
        "    memory_key='chat_history',\n",
        "    k=5,\n",
        "    return_messages=True\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "BvpyjfUwnBLx"
      },
      "source": [
        "Initially we have no `\"chat_history\"` so we will pass an empty string to our `invoke` method:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 23,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "KpKMRBMimEOt",
        "outputId": "31a36561-cf99-4f88-e3bc-c651ad3a5358"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "\n",
            "\n",
            "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
            "\u001b[32;1m\u001b[1;3m <final_answer>Hello! I'm here to try to help answer any questions you may have. Feel free to ask me something and I'll do my best to provide a helpful response.\u001b[0m\n",
            "\n",
            "\u001b[1m> Finished chain.\u001b[0m\n"
          ]
        }
      ],
      "source": [
        "user_msg = \"hello mate\"\n",
        "\n",
        "out = agent_executor.invoke({\n",
        "    \"input\": \"hello mate\",\n",
        "    \"chat_history\": \"\"\n",
        "})"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "O9UvnBrsnNVw"
      },
      "source": [
        "We haven't attached our conversational memory to our agent \u2014 so the `conversational_memory` object will remain empty:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 24,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "KJZNIDAslNoC",
        "outputId": "3aff3b9b-8c10-47c5-9fe9-5582a4ba0502"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "[]"
            ]
          },
          "execution_count": 24,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "conversational_memory.chat_memory.messages"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ynX9Wca6nawr"
      },
      "source": [
        "We must manually add the interactions between ourselves and the agent to our memory."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 25,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "-5hXy1FAnne3",
        "outputId": "c80251db-30be-4dfc-b590-4ab5eb36482f"
      },
      "outputs": [
        {
          "data": {
            "text/plain": [
              "[HumanMessage(content='hello mate'),\n",
              " AIMessage(content=\"Hello! I'm here to try to help answer any questions you may have. Feel free to ask me something and I'll do my best to provide a helpful response.\")]"
            ]
          },
          "execution_count": 25,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "conversational_memory.chat_memory.add_user_message(user_msg)\n",
        "conversational_memory.chat_memory.add_ai_message(out[\"output\"])\n",
        "\n",
        "conversational_memory.chat_memory.messages"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "T3pA0o6ZnrAl"
      },
      "source": [
        "Now we can see that _two_ messages have been added, our `HumanMessage` the the agent's `AIMessage` response. Unfortunately, we cannot send these messages to our XML agent directly. Instead, we need to pass a string in the format:\n",
        "\n",
        "```\n",
        "Human: {human message}\n",
        "AI: {AI message}\n",
        "```\n",
        "\n",
        "Let's write a quick `memory2str` helper function to handle this for us:"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 26,
      "metadata": {
        "id": "raZHBdJmtGH-"
      },
      "outputs": [],
      "source": [
        "from langchain_core.messages.human import HumanMessage\n",
        "\n",
        "def memory2str(memory: ConversationBufferWindowMemory):\n",
        "    messages = memory.chat_memory.messages\n",
        "    memory_list = [\n",
        "        f\"Human: {mem.content}\" if isinstance(mem, HumanMessage) \\\n",
        "        else f\"AI: {mem.content}\" for mem in messages\n",
        "    ]\n",
        "    memory_str = \"\\n\".join(memory_list)\n",
        "    return memory_str"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 27,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "t89yX-6i3hvd",
        "outputId": "496db0fa-a5e5-4494-8645-ba6598dec95e"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Human: hello mate\n",
            "AI: Hello! I'm here to try to help answer any questions you may have. Feel free to ask me something and I'll do my best to provide a helpful response.\n"
          ]
        }
      ],
      "source": [
        "print(memory2str(conversational_memory))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "q0L_80WrpWqd"
      },
      "source": [
        "Now let's put together another helper function called `chat` to help us handle the _state_ part of our agent."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 28,
      "metadata": {
        "id": "C-Ck2Lv53rD-"
      },
      "outputs": [],
      "source": [
        "def chat(text: str):\n",
        "    out = agent_executor.invoke({\n",
        "        \"input\": text,\n",
        "        \"chat_history\": memory2str(conversational_memory)\n",
        "    })\n",
        "    conversational_memory.chat_memory.add_user_message(text)\n",
        "    conversational_memory.chat_memory.add_ai_message(out[\"output\"])\n",
        "    return out[\"output\"]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "XIheLeTBsO9S"
      },
      "source": [
        "Now we simply chat with our agent and it will remember the context of previous interactions."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 29,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "iJ_PH7YcA_f2",
        "outputId": "0de54a11-5abc-4db0-8ea8-62aa78589344"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "\n",
            "\n",
            "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
            "\u001b[32;1m\u001b[1;3m <tool>arxiv_search</tool><tool_input>llama 2\u001b[0m\u001b[36;1m\u001b[1;3mEthical Considerations and Limitations (Section 5.2) Llama 2 is a new technology that carries risks with use. Testing conducted to date has been in English, and has not covered, nor could it cover all scenarios. For these reasons, as with all LLMs, Llama 2\u00e2\u0080\u0099s potential outputs cannot be predicted in advance, and the model may in some instances produce inaccurate or objectionable responses to user prompts. Therefore, before deploying any applications of Llama 2, developers should perform safety testing and tuning tailored to their speci\u00ef\u00ac\u0081c applications of the model. Please see the Responsible Use Guide available available at https://ai.meta.com/llama/responsible-user-guide\n",
            "Table 52: Model card for Llama 2.\n",
            "77\n",
            "\n",
            "Model Developers Meta AI Variations Llama 2 comes in a range of parameter sizes\u00e2\u0080\u00947B, 13B, and 70B\u00e2\u0080\u0094as well as pretrained and \u00ef\u00ac\u0081ne-tuned variations. Input Models input text only. Output Models generate text only. Model Architecture Llama 2 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised \u00ef\u00ac\u0081ne-tuning (SFT) and reinforce- ment learning with human feedback (RLHF) to align to human preferences for helpfulness and safety. Model Dates Llama 2 was trained between January 2023 and July 2023. Status This is a static model trained on an o\u00ef\u00ac\u0084ine dataset. Future versions of the tuned models will be released as we improve model safety with community feedback. License Where to send com- ments A custom commercial models-and-libraries/llama-downloads/ Instructions on how to provide feedback or comments on the model can be found in the model README, or by opening an issue in the GitHub repository (https://github.com/facebookresearch/llama/). license is available at: ai.meta.com/resources/ Intended Use Intended Use Cases Llama 2 is intended for commercial and research use in\n",
            "\n",
            "# GenAI, Meta\n",
            "# Abstract\n",
            "In this work, we develop and release Llama 2, a collection of pretrained and \u00ef\u00ac\u0081ne-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our \u00ef\u00ac\u0081ne-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed- source models. We provide a detailed description of our approach to \u00ef\u00ac\u0081ne-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.\n",
            "\u00e2\u0088\u0097Equal contribution, corresponding authors: {tscialom, htouvron}@meta.com \u00e2\u0080\u00a0Second author\n",
            "Contributions for all the authors can be found in Section A.1.\n",
            "# Contents\n",
            "# 1 Introduction\n",
            "\n",
            "LLaMA-2 LLaMA-2 (Touvron et al., 2023b) consists of a series of base language models with a parameter count ranging from 7 billion to 70 billion. These base models are solely trained to opti- mize the likelihood of next-word prediction in the language modeling task. For a fair comparison, we employ the same prompt for LLaMA-2 as used for Dromedary-2.\n",
            "LLaMA-2-Chat LLaMA-2-Chat (Touvron et al., 2023b) is an adaptation tailored for dialogue applications. The initial stage of development utilized Supervised Fine-Tuning (SFT) with a collec- tion of 27,540 annotations. For reward modeling, the new human preference annotations for safety and helpfulness reached a count of 1,418,091. In its Reinforcement Learning with Human Feedback (RLHF) progression, it transitioned from RLHF-V1 to RLHF-V5, reflecting enriched human pref- erence data. The model predominantly employed Rejection Sampling fine-tuning up to RLHF-V4. Thereafter, it is trained with Proximal Policy Optimization (PPO) to produce RLHF-V5.\n",
            "\n",
            "In this work, we develop and release Llama 2, a family of pretrained and \u00ef\u00ac\u0081ne-tuned LLMs, Llama 2 and Llama 2-Chat, at scales up to 70B parameters. On the series of helpfulness and safety benchmarks we tested, Llama 2-Chat models generally perform better than existing open-source models. They also appear to be on par with some of the closed-source models, at least on the human evaluations we performed (see Figures 1 and 3). We have taken measures to increase the safety of these models, using safety-speci\u00ef\u00ac\u0081c data annotation and tuning, as well as conducting red-teaming and employing iterative evaluations. Additionally, this paper contributes a thorough description of our \u00ef\u00ac\u0081ne-tuning methodology and approach to improving LLM safety. We hope that this openness will enable the community to reproduce \u00ef\u00ac\u0081ne-tuned LLMs and continue to improve the safety of those models, paving the way for more responsible development of LLMs. We also share novel observations we made during the development of Llama 2 and Llama 2-Chat, such as the emergence of tool usage and temporal organization of knowledge.\n",
            "3\u001b[0m\u001b[32;1m\u001b[1;3m <final_answer>\n",
            "Llama 2 is a series of large language models developed by Meta AI, with model sizes ranging from 7 billion to 70 billion parameters.\n",
            "\n",
            "The base Llama 2 models are pretrained language models optimized for next word prediction. Llama 2-Chat is a version fine-tuned specifically for dialog applications, using a combination of supervised fine-tuning and reinforcement learning with human feedback.\n",
            "\n",
            "Key points about Llama 2:\n",
            "\n",
            "- Outperforms other open source chat models on helpfulness and safety benchmarks\n",
            "- Comparable to some closed source models on human evaluations \n",
            "- Fine-tuning focused on improving safety as well as performance\n",
            "- Thorough documentation to enable reproducibility and responsible LLM development\n",
            "\n",
            "So in summary, Llama 2 is an open source family of large language models, with Llama 2-Chat being the dialog-optimized version, notable for its combination of strong performance and efforts towards safety.\n",
            "\u001b[0m\n",
            "\n",
            "\u001b[1m> Finished chain.\u001b[0m\n",
            "\n",
            "Llama 2 is a series of large language models developed by Meta AI, with model sizes ranging from 7 billion to 70 billion parameters.\n",
            "\n",
            "The base Llama 2 models are pretrained language models optimized for next word prediction. Llama 2-Chat is a version fine-tuned specifically for dialog applications, using a combination of supervised fine-tuning and reinforcement learning with human feedback.\n",
            "\n",
            "Key points about Llama 2:\n",
            "\n",
            "- Outperforms other open source chat models on helpfulness and safety benchmarks\n",
            "- Comparable to some closed source models on human evaluations \n",
            "- Fine-tuning focused on improving safety as well as performance\n",
            "- Thorough documentation to enable reproducibility and responsible LLM development\n",
            "\n",
            "So in summary, Llama 2 is an open source family of large language models, with Llama 2-Chat being the dialog-optimized version, notable for its combination of strong performance and efforts towards safety.\n",
            "\n"
          ]
        }
      ],
      "source": [
        "print(chat(\"can you tell me about llama 2?\"))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "5p8m4Gc5w1OX"
      },
      "source": [
        "We can ask follow up questions that miss key information but thanks to the conversational history the LLM understands the context and uses that to adjust the search query.\n",
        "\n",
        "_Note: if missing `\"chat_history\"` parameter from the `agent` definition you will likely notice a lack of context in the search term, and in some cases this lack of good information can trigger a `ValueError` during output parsing._"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 30,
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "3XJ_3JIgBDRl",
        "outputId": "d7d23e70-c894-46a6-f990-73eb2135a36c"
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "\n",
            "\n",
            "\u001b[1m> Entering new AgentExecutor chain...\u001b[0m\n",
            "\u001b[32;1m\u001b[1;3m <tool>arxiv_search</tool><tool_input>Llama 2 red team\u001b[0m\u001b[36;1m\u001b[1;3m15\n",
            "pafety Reward Model Scores Distribution on Red Teaming Prompts\n",
            "Responding Model GPT 3.5 Turbo Code Llama 138 Instruct Code Llama 34B Instruct Code Llama 7B Instruct 0.0-+ -0.2 0.0 0.2 0.4 0.6 08 1.0 12 Llama 2 70B Safety Reward Model Score\n",
            "Figure 7: KDE plot of the risk score output by the Llama 2 safety reward model on prompts with clear intent specific to code risk created by red teamers with background in cybersecurity and malware generation.\n",
            "One red teamer remarked, \u00e2\u0080\u009cWhile LLMs being able to iteratively improve on produced source code is a risk, producing source code isn\u00e2\u0080\u0099t the actual gap. That said, LLMs may be risky because they can inform low-skill adversaries in production of scripts through iteration that perform some malicious behavior.\u00e2\u0080\u009d\n",
            "According to another red teamer, \u00e2\u0080\u009c[v]arious scripts, program code, and compiled binaries are readily available on mainstream public websites, hacking forums or on \u00e2\u0080\u0098the dark web.\u00e2\u0080\u0099 Advanced malware development is beyond the current capabilities of available LLMs, and even an advanced LLM paired with an expert malware developer is not particularly useful- as the barrier is not typically writing the malware code itself. That said, these LLMs may produce code which will get easily caught if used directly.\u00e2\u0080\u009d\n",
            "\n",
            "In addition to red teaming sessions, we ran a quantitative evaluation on risk from generating malicious code by scoring Code Llama\u00e2\u0080\u0099s responses to ChatGPT\u00e2\u0080\u0099s (GPT3.5 Turbo) with LLAMAv2 70B\u00e2\u0080\u0099s safety reward model. For this second quantitative evaluation, we selected prompts that the red teamers generated specifically attempting to solicit malicious code (even though the red teaming included consideration of a broad set of safety risks). These prompts were a mix of clear intent and slightly obfuscated intentions (see some examples in Figure 15. We show a KDE plot of the distribution of the safety score for all models in Figure 7). We observe that Code Llama tends to answer with safer responses; the distribution of safety scores for Code Llama has more weight in the safer part of the range.\n",
            "\n",
            "Pushkar Mishra, Igor Moly- bog, Yixin Nie, Andrew Poulton, Jeremy Reizen- stein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subrama- nian, Xiaoqing Ellen Tan, Binh Tang, Ross Tay- lor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Ro- driguez, Robert Stojnic, Sergey Edunov, and Thomas Scialom. 2023b. Llama 2: Open foundation and fine-tuned chat models.\n",
            "\n",
            "Pushkar Mishra, Igor Moly- bog, Yixin Nie, Andrew Poulton, Jeremy Reizen- stein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subrama- nian, Xiaoqing Ellen Tan, Binh Tang, Ross Tay- lor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurelien Ro- driguez, Robert Stojnic, Sergey Edunov, and Thomas Scialom. 2023b. Llama 2: Open foundation and fine-tuned chat models.\n",
            "\n",
            "Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiao- qing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aur\u00c3\u00a9lien Rodriguez, Robert Stojnic, Sergey Edunov, and Thomas Scialom. Llama 2: Open foundation and fine-tuned chat models. CoRR, abs/2307.09288, 2023.\u001b[0m\u001b[32;1m\u001b[1;3m Based on the information from the arXiv search, it does appear that some red teaming was done on Llama 2 models:\n",
            "\n",
            "<final_answer>\n",
            "Yes, red teaming was done on Llama 2 models to evaluate risks from generating malicious code. This included both qualitative red teaming sessions as well as quantitative evaluation scoring code generation prompts with a safety reward model. The red teaming and evaluations found that while iterative code improvement remains a risk, Llama 2 tended to respond with safer code compared to other models.\n",
            "\u001b[0m\n",
            "\n",
            "\u001b[1m> Finished chain.\u001b[0m\n",
            "\n",
            "Yes, red teaming was done on Llama 2 models to evaluate risks from generating malicious code. This included both qualitative red teaming sessions as well as quantitative evaluation scoring code generation prompts with a safety reward model. The red teaming and evaluations found that while iterative code improvement remains a risk, Llama 2 tended to respond with safer code compared to other models.\n",
            "\n"
          ]
        }
      ],
      "source": [
        "print(chat(\"was any red teaming done?\"))"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "SelG8OcOxggP"
      },
      "source": [
        "We get a reasonable answer here. It's worth noting that with previous iterations of this test, ie \"llama 2 red teaming\" using the original `ai-arxiv` dataset rarely (if ever) returned directly relevant results."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "e9bI9czPtWnl"
      },
      "source": [
        "---"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "display_name": "ml",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.9.12"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}